<!--This file created 5/13/98 2:23 PM by Claris Home Page version 2.0-->
<HTML>
<HEAD>
   <TITLE>Cryptanalysis of CipherSaber-1</TITLE>
   <META NAME=GENERATOR CONTENT="Claris Home Page 2.0">
   <X-SAS-WINDOW TOP=42 BOTTOM=589 LEFT=4 RIGHT=534>
   <X-SAS-REMOTESAVE SERVER="iecc.com" USER="arnold"
   DIR="ciphersaber/" FILE="">
</HEAD>
<BODY BGCOLOR="#FCFCD6">

<H1>A Cryptanalysis of CipherSaber-1</H1>

<P><A HREF="http://ciphersaber.gurus.com">CipherSaber 1</A> is a way
of using Ron Rivest's RC4 cipher as a practical symmetric key
cryptosystem. It is intended to demonstrate that strong cryptography
can be very simple to implement, and therefor impossible to suppress
without massive infringement on civil liberties. A CipherSaber-1
program can be written in 16 lines of QBasic. CipherSaber is
explained more fully at http://ciphersaber.gurus.com.</P>

<P>In CipherSaber-1, the ASCII representation of the user key or
passphrase is used directly as part of the RC4 key. To prevent the
same RC4 key from being used twice, a ten byte initialization vector
(IV) is appended to the user key to form the complete RC4 key. The IV
is stored as the first ten bytes of CipherSaber-1 encoded files.</P>

<P>This paper presents and analysis of several potential weaknesses
in CipherSaber-1 and recommendations for avoiding them. We use the
same notation here that Bruce Schneier uses in his description of RC4
in the second edition of his Applied Cryptography, except that the
stream of cipher bytes that RC4 produces are denoted as C(i).</P>

<H2>1. Key Size</H2>

<P>In most cryptosystems, the longer the key, the better. However, it
was pointed out by several individuals who looked at CipherSaber-1
that a very long user key could prevent the IV from affecting more
than a few of the S(i) values. This could cause leakage of plaintext.
To insure adequate mixing, we advised users to restrict their user
key length to 54 characters or less, as most would anyway. This
insures that the IV will be mixed in at least four times during the
setup process. Here is a more detailed justification of the 54
character limit.</P>

<P>We want to estimate the likelihood that initial cipher bytes will
be unaffected by the IV. Assume a user key length of 54 bytes, the
maximum length we recommend. Then all the S(i) after the first 54
will be affected by the IV during key setup. S(0) through S(53) will
only be affected by the IV if they are swapped in key setup rounds 55
through 256. That means there 202 opportunities. The probability that
a given S(i) will remain unswapped is</P>

<BLOCKQUOTE><P>(255/256)**202 = 0.45.</P></BLOCKQUOTE>

<P>So the expected number of the first 54 S(i) that are unaffected is
54*0.45 = 24.5. Thus the probability that S(i), where i is selected
at random from [0,256], has not been affected by the IV is 24.5/256 =
0.096</P>

<P>We can calculate the values that the first few S(i) are likely be
have after the key setup is complete if we assume that none of the
first few swaps hit an S(i) that has already been swapped previously.
Recall that at the start of key setup, S(i) = i, for all i.</P>

<P>Then:</P>

<BLOCKQUOTE><P>S(0) = K(0)</P>

<P>S(1) = K(0)+K(1)+1</P>

<P>S(2) = K(0)+K(1)+K(2)+3</P>

<P>...</P></BLOCKQUOTE>

<P>Call the successive cipher stream bytes C(i). If the first few
S(i) are not affected during the production of the first few cipher
bytes, then:</P>

<BLOCKQUOTE><P>C(0) = S(S(1) + S(S(1)))</P>

<P>C(1) = S(S(2) + S(S(1)+S(2)))</P>

<P>C(2) = S(S(3) + S(S(1)+S(2)+S(3)))</P>

<P>...</P>

<P>C(i) = S(S(i+1) + S(x)), where x = S(1)+S(2)+...+S(i+1) mod 256
</P></BLOCKQUOTE>

<P>Each S(i) for i between 0 and 53 have a 0.45 chance of being
unaffected by the IV. Therefore x has a 0.45**(i+1) chance of being
unaffected.</P>

<P>The values S(x) and S(S(i+1)+S(x)) each have a .096 chance of
being unaffected. Therefore C(i) has a chance p(i) of being
unaffected by the IV, where</P>

<BLOCKQUOTE><P>p(i) =0 .096*0.096*0.45**(i+1) = 0.0092*0.45**(i+1)
</P></BLOCKQUOTE>

<PRE>     i	1/p(i)
&nbsp;
     0	242
     1	539
     2	1198
     3	2662</PRE>

<H3>1.1 C(0) Analysis</H3>

<P>The most vulnerable Cipher byte would appear to be C(0). However
assume further that the first two characters of the user key are
7-bit ASCII and donot contain control characters. Thus they are in
the range [20, 7E] hex, i.e. [32, 126] decimal.</P>

<P>Recall that S(1) = K(0)+K(1)+1 and therefore is in the range
[65-253].</P>

<P>Also remember that: C(0) = S(S(1) + S(S(1)))</P>

<P>We see that the S(S(1)) term is guaranteed to select an S value
that was affected by the IV. So if the first two characters of the
user key are 7-bit ASCII, C(0) will always be afffeced by the IV.
</P>

<H3>1.2 C(1) Analysis</H3>

<P>Based on the above analysis, C(1) now appears to be the most
vulnerable cipher byte.</P>

<P>Remember that C(1) = S(S(2) + S(S(1)+S(2))) and S(1) = K(0)+K(1)+1
</P>

<P>If K(0) and K(1) are in the range [27, 100] decimal, then S(1)
must be in the range [55, 201] decimal. This means that S(2) and
S(1)+S(2) cannot both be in the range [0, 53] and C(1) is guaranteed
to be affected by the IV. The condition that K(0) and K(1) are in the
range [27, 100] decimal, which is [1B, 64] hex, is satisfied if the
first two characters in the key are digits or uppercase characters.
So if the first two characters of the user key are digits or
uppercase characters, C(1) will always be afffeced by the IV. Given
that the chance of C(1) being unaffected by the IV is 1/539 which is
less than the likelihood of any random 8-bit value (1/256), we have
not recommended that users restrict the first two characters of their
key to digits or uppercase.</P>

<H3>1.3 Effect of Varying Key Size</H3>

<P>For a 41 byte key (the size that insures 5 mixings), the C(1)
unaffected probability is computed as follows. The probability that
one of the first 41 S(i) being unaffected by the IV is</P>

<BLOCKQUOTE><P>(255/256)**(256-41)=.43</P></BLOCKQUOTE>

<P>Probability of a random S(i) being unaffected is 0.43*41/256 =
0.069.</P>

<P>Probability p(1) of C(1) being unaffected is (0.43*0.069)**2 =
1/1134. This is much less than the 1/256 likelihood of a random 8-bit
value.</P>

<P>For a 74 byte key (the size that insures 3 mixings), the
probability p(1) that C(1) is unaffected by the IV is as follows.
</P>

<P>The probability that one of the first 74 S(i) being unaffected by
the IV is:</P>

<BLOCKQUOTE><P>(255/256)**(256-74) = 0.49</P></BLOCKQUOTE>

<P>Probability of a random S(i) being unaffected is 0.49*74/256 =
.141</P>

<P>Probability of C(1) being unaffected is (0.49*0.141)**2 = 1/209.
This is greater than the 1/256 likelihood of a random 8-bit value,
justifying our recommendation not to use a key this big.</P>

<H2>2. Weak Keys</H2>

<P>On September 22, 1995, Andrew Roos posted a paper to
sci.crypt.research that presented a class of weak keys in RC4. The
conditions for this class are</P>

<BLOCKQUOTE><P>K(0) + K(1) = 0 mod 256.</P></BLOCKQUOTE>

<P>These weak keys can leak information about the first byte of the
key or cipher stream.</P>

<P>Note that if the K(0) and K(1) are 7-bit ASCII characters at least
one of which is not NUL (00), the above condition is never satisfied.
Thus the use of printable 7-bit ASCII keys avoids this class of weak
keys.</P>

<P>Several people have asked why the IV isn't prepended to the user
key. Doing so could permit weak keys to occur.</P>

<H2>3. IV Size and Randomness</H2>

<P>RC4 is a stream cipher. That means RC4 produces a stream of bits
that are xor'ed with the bits in the message. In a stream cipher it
is essential that the same key never be used twice.</P>

<P>If Sam the skilled snoop ever encounters two different messages
encrypted with the same RC4 key, then Sam can xor their cipher text
to eliminate the RC4 cipher bits because xor repeated twice is a null
operation. Sam will then have a stream of bits that represent the xor
of the plaintext of the two messages. If Sam knows the plaintext of
one of the messages, he can easily recover the other. Worse, if the
messages are in a natural language such as English, he can usually
recover both messages -- even if he knew neither. While revelation of
encoded messages is bad news for any cipher, in neither case is Sam
able to recover the user key.</P>

<P>CipherSaber addresses the requirement that the same key never be
used twice by appending a ten byte quantity, called the
initialization vector (IV), to the users key. If the same IV is never
used twice then the same RC4 key can never be used twice.</P>

<P>The IV values can be any unique 8-bit quantities. For example, one
could use the year, month, day, hour minute second and hundredth of a
second. As long as no two messages were sent at exactly the same
time, you would be safe.</P>

<P>Another way to get a high likelihood of uniqueness is to use
random values for the IV bytes. This also makes encrypted file look
like totally random byte strings, which may be useful in some cases.
</P>

<P>Since there are 80 bits in the IV, if n messages have been encoded
with the same user key, the probability q(n) that two messages will
have the same IV is</P>

<BLOCKQUOTE><P>q(n) = n(n-1)/(2**80)</P></BLOCKQUOTE>

<P>Even if one generated a billion messages, the probability that two
messages have the same IV is about one in a million. There is,
however a big "but" as we see in the next section.</P>

<H3>3.1 Use of Built-in Random Number Generators to Create the IV.
</H3>

<P>The "but" is that we are assuming all the IV bytes are random.
Creating random values on a computer can be tricky, In particular,
the built-in random number generators supplied in most programming
environments are notoriously bad.</P>

<P>Most C programming environments offer a 32 bit random number
generator. If such a generator is used to create an 80-bit IV there
is still at most only 32 bits of randomness. The probability q(n)
that two messages will have the same IV is then</P>

<BLOCKQUOTE><P>q(n) = n(n-1)/(2**32)</P></BLOCKQUOTE>

<P>If you want q(n) to be less than one in a thousand, you can only
send about 2000 messages with the same user key. To achieve even this
level of safety, you need to initialize the random number generator
with a seed that has 32 bits of randomness.</P>

<H3>3.2 Problems with QBasic as a Source of IV Randomness</H3>

<P>A more extreme example is Microsoft's QBasic 4.5. This Basic
interpreter is shipped ont the CD-ROM that comes with every copy of
Windows '95, making it undoubtedly the most widely distributed
programming environment in the world. QBasic is very well suited for
implementing RC4. It even has a SWAP statement. However, its random
number generator, RND, has only 24 bits of state. This means if you
want q(n) to be less than one in a hundred, you can only send about
400 messages with the same user key.</P>

<P>Even worse, the standard way to initialize QBasic's random number
generator is to put the statement RANDOMIZE TIMER at the beginning of
the program. TIMER is a built in function that returns a floating
point value containing the number of seconds since midnight. TIMER's
resolution appears to be 1/100 of a second. but the operating system
only supports resolution of 55 milliseconds. Assuming an eight hour
variability in time of encryption, TIMER only supplies about 19 bits
of randomness. This means if you want q(n) to be less than one in a
hundred, you can only send about 72 messages with the same user key.
</P>

<H3>3.3 Better Ways to Create the IV in QBasic</H3>

<P>Here are some methods for generating the CipherSaber IV in QBasic.
</P>

<P>The first is to xor the RND values with the characters returned by
the DATE$ function:</P>

<PRE><TT>     randomize timer
     for i = 1 to 10
          iv$(i)=chr$(asc(mid$(date$,i,1)) xor 256*rnd)
     next  i</TT></PRE>

<P>This method sharply reduces the likelihood that two messages will
have the same IV. However, while the encrypted file will look random
to a casual observer, an expert could detect that the IV was not
really random and might even be able to extract the date of the
message. In most cases this is not a concern since the date could be
determined by the message header or file directory block.</P>

<P>Here is a method that comes closer to creating a truly random IV.
It captures key stroke times while the user key (k$) is entered.</P>

<PRE><TT>     get: c$=inkey$
     if c$ = "" then goto get
     if c$ = CR$ then go to done
     i=i+1
     j=(i mod 10) + 1
     k$(i)=c$
     randomize timer
     iv$(j)=chr$((256*rnd + asc(iv$(j)) mod 256)
     goto get</TT>
&nbsp;</PRE>

<P>Even with DOS's low timing resolution, one can expect an 8 tick
variation per character entered, producing 3 bits of entropy. With a
15 character user key, this would produce about 45 bits of entropy.
The 18 to 20 bits of entropy represented by the first randomize timer
call would bring the IV close to 65 bits of entropy. Given that users
tend to enter random keys slowly, 4 bit of entropy per character is
more likely, bringing us close to a full 80 bits of randomness.</P>

<P>One disadvantage of the key stroke timing method is that you must
force the user to enter their key afresh for every message.</P>

<H2>4. CipherSaber-2</H2>

<P>To solve the problems caused by the single round of key setup in
RC4, I was originally going to include a CipherSaber-2 in my
proposal. CipherSaber-2 has a parameter N that specifies the number
of times the RC4 key setup loop is repeated. For N=1, CipherSaber-2
would just be the same as CipherSaber-1. The value of N could be
agreed upon by both users and kept secret along with the key. Indeed,
N can be considered part of the key. I see three advantages over RC4:
</P>

<P>1. For n &gt;= 2 it solves the mixing problem. User keys can be up
to 246 bytes without any problems.</P>

<P>2. It would coax users to have longer keys. A big weakness in
cypto-for-the-masses is that people just will not pick long enough
keys. Having a second "bucket" helps from a human factors perspective
(sort of like area code and 7-digit phone number vs. a straight
10-digit phone number)</P>

<P>3. Large values of N makes exhaustive key search much slower. A
thousand-fold increase in the time needed to test a key is equivalent
to adding 10 bits of entropy to the key. A CipherSaber-2 written in C
and running on a Pentium could probably stand a five or six digit N
without an unacceptable delay.</P>

<P>I did not include CipherSaber-2 in the initial announcement
because I wanted to keep the proposal simple and because I wanted to
use a cipher that was straight out of a textbook. Also it is
foolhardy to introduce a new cipher without a time for other people
to review it.</P>

<H2>5. Recommendations</H2>

<P>To use CipherSaber securely, we recommend that users:</P>

<UL>
   <LI>Use a high entropy key or passphrase. Using low entropy keys
   is the most common mistake users make A low entropy key will
   render any cryptosystem vulnerable. We recommend using a key
   consisting of 15 or more random characters, or 5 more short
   dictionary words. See http://www.hayom.com/diceware.html for a
   simple yet rigorous way to make high entropy passphrases.
   
   <LI>Keep keys 54 characters or shorter (most users would do this
   anyway)
   
   <LI>Use 7-bit printable characters, at least for the first few
   characters (most users would do this anyway as well)
   
   <LI>Use a strong method for generating the random IV, particularly
   if you plan to encode more than a few dozen messages with the same
   user key. See Section 3.3, above.
   
   <LI>For highest security, use CipherSaber-2, along with a strong
   source of random numbers.
</UL>

<P>
<HR>
I want to thank Hal Finney for his valuable comments on a much
earlier draft. Additional comments on this paper in general and on
CipherSaber-2 in particular are welcome.</P>

<P><A HREF="mailto:arnold@iecc.com">Arnold G. Reinhold</A></P>

<P>March 19, 1998 Rev f. HTML version, Basic and PC timer correction
and minor edits April 22, 1998, May 13, 1998</P>

<H4><A HREF="http://ciphersaber.gurus.com">Return to CipherSaber Home
Page</A></H4>

<P>&nbsp;</P>

<P>&nbsp;</P>

<P>&nbsp;</P>
</BODY>
</HTML>
